{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Class01\n",
    "* Download data from WRDS and read into Python\n",
    "* Use WRDS Python package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Download data from WRDS\n",
    "WRDS provides a user-friendly wbe interface. It is very intuitive and straightforward to download data from WRDS.\n",
    "\n",
    "Taking a sample of CRSP data as an example, we will download CRSP monthly data from 2010 to 2017 and read the sample data into Python. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Read data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "file_path = '/Users/ml/Google Drive/af/teaching/database/data/'\n",
    "msf_raw = pd.read_csv(file_path+'msf_2010_2017.txt',sep='\\t',low_memory=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PERMNO</th>\n",
       "      <th>date</th>\n",
       "      <th>SHRCD</th>\n",
       "      <th>EXCHCD</th>\n",
       "      <th>SICCD</th>\n",
       "      <th>NCUSIP</th>\n",
       "      <th>COMNAM</th>\n",
       "      <th>PERMCO</th>\n",
       "      <th>HSICCD</th>\n",
       "      <th>CUSIP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100129</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>29269V10</td>\n",
       "      <td>ENERGY INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100226</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>29269V10</td>\n",
       "      <td>ENERGY INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100331</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>29269V10</td>\n",
       "      <td>ENERGY INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100430</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>29269V10</td>\n",
       "      <td>ENERGY INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100528</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>29269V10</td>\n",
       "      <td>ENERGY INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100630</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>29269V10</td>\n",
       "      <td>ENERGY INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100730</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "      <td>GAS NATURAL INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100831</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "      <td>GAS NATURAL INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>10001</td>\n",
       "      <td>20100930</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "      <td>GAS NATURAL INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10001</td>\n",
       "      <td>20101029</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "      <td>GAS NATURAL INC</td>\n",
       "      <td>7953</td>\n",
       "      <td>4925</td>\n",
       "      <td>36720410</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PERMNO      date  SHRCD  EXCHCD SICCD    NCUSIP           COMNAM  PERMCO  \\\n",
       "0   10001  20100129   11.0     2.0  4925  29269V10       ENERGY INC    7953   \n",
       "1   10001  20100226   11.0     2.0  4925  29269V10       ENERGY INC    7953   \n",
       "2   10001  20100331   11.0     2.0  4925  29269V10       ENERGY INC    7953   \n",
       "3   10001  20100430   11.0     2.0  4925  29269V10       ENERGY INC    7953   \n",
       "4   10001  20100528   11.0     2.0  4925  29269V10       ENERGY INC    7953   \n",
       "5   10001  20100630   11.0     2.0  4925  29269V10       ENERGY INC    7953   \n",
       "6   10001  20100730   11.0     2.0  4925  36720410  GAS NATURAL INC    7953   \n",
       "7   10001  20100831   11.0     2.0  4925  36720410  GAS NATURAL INC    7953   \n",
       "8   10001  20100930   11.0     2.0  4925  36720410  GAS NATURAL INC    7953   \n",
       "9   10001  20101029   11.0     2.0  4925  36720410  GAS NATURAL INC    7953   \n",
       "\n",
       "  HSICCD     CUSIP  \n",
       "0   4925  36720410  \n",
       "1   4925  36720410  \n",
       "2   4925  36720410  \n",
       "3   4925  36720410  \n",
       "4   4925  36720410  \n",
       "5   4925  36720410  \n",
       "6   4925  36720410  \n",
       "7   4925  36720410  \n",
       "8   4925  36720410  \n",
       "9   4925  36720410  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "msf_raw.iloc[:10,:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### WRDS Python package\n",
    "* You should have Python installed already.\n",
    "* Use **pip** to install:\n",
    "\n",
    "    *pip install wrds*\n",
    "\n",
    "* Then you should be able to import this package.\n",
    "\n",
    "[WRDS offers a guidance how to work Python with WRDS server](https://wrds-www.wharton.upenn.edu/pages/support/wrds-cloud/holding/python-programming-wrds/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Import Python package: wrds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import wrds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Build connection with WRDS server\n",
    "You need to type your WRDS username and password."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "db = wrds.Connection()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Check available libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['compb',\n",
       " 'compbd',\n",
       " 'compdcur',\n",
       " 'compg',\n",
       " 'compgd',\n",
       " 'compm',\n",
       " 'compmcur',\n",
       " 'compnad',\n",
       " 'compsamp_snapshot',\n",
       " 'compseg',\n",
       " 'compsegd',\n",
       " 'contrib',\n",
       " 'crsp',\n",
       " 'crsp_a_indexes',\n",
       " 'crsp_a_stock',\n",
       " 'crsp_q_indexhist',\n",
       " 'crspa',\n",
       " 'crspq',\n",
       " 'csmar',\n",
       " 'csmar_financial']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.list_libraries()[20:40]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Check available databases\n",
    "Each library contains different databases and we need to find out the location of required data within the library.\n",
    "\n",
    "For example, let's read CRSP monthly stock file. You may notice that we have **crsp**, **crspa** and **crspq**. All of them include monthly stock file and daily stock file, and the difference is the update frequency:\n",
    "* crsp: daily update\n",
    "* crspa: annual update\n",
    "* crspq: quarterly update\n",
    "\n",
    "> ! Note: **crspa** (annual update) dose not mean that the database only contains yearly stock price. *-annual update-* means that WRDS will update daily file and monthly file every year.\n",
    "\n",
    "You have to choose the one with access permission in your institution.\n",
    "\n",
    "We have annual update CRSP, therefore, we will read monthly stock file from **crspa**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['mseshares',\n",
       " 'msf',\n",
       " 'msf62',\n",
       " 'msfhdr',\n",
       " 'msfhdr62',\n",
       " 'msi',\n",
       " 'msi62',\n",
       " 'msia',\n",
       " 'msib',\n",
       " 'msic']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.list_tables('crspa')[140:150]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "> **msf** is the monthly stock file. Then we know the path of CRSP monthly stock file: **crspa** > **msf**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Check variables\n",
    "Before you read data, you can check number of observations and what variables in this database.\n",
    "\n",
    "The disadvantage compared to web platform is that you cannot check the definition directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Approximately 4464480 rows in crspa.msf.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>nullable</th>\n",
       "      <th>type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>cusip</td>\n",
       "      <td>True</td>\n",
       "      <td>VARCHAR(8)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>permno</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>permco</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>issuno</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>hexcd</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>hsiccd</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>date</td>\n",
       "      <td>True</td>\n",
       "      <td>DATE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>bidlo</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>askhi</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>prc</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>vol</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>ret</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>bid</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>ask</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>shrout</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>cfacpr</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>cfacshr</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>altprc</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>spread</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>altprcdt</td>\n",
       "      <td>True</td>\n",
       "      <td>DATE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>retx</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE PRECISION</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        name  nullable              type\n",
       "0      cusip      True        VARCHAR(8)\n",
       "1     permno      True  DOUBLE PRECISION\n",
       "2     permco      True  DOUBLE PRECISION\n",
       "3     issuno      True  DOUBLE PRECISION\n",
       "4      hexcd      True  DOUBLE PRECISION\n",
       "5     hsiccd      True  DOUBLE PRECISION\n",
       "6       date      True              DATE\n",
       "7      bidlo      True  DOUBLE PRECISION\n",
       "8      askhi      True  DOUBLE PRECISION\n",
       "9        prc      True  DOUBLE PRECISION\n",
       "10       vol      True  DOUBLE PRECISION\n",
       "11       ret      True  DOUBLE PRECISION\n",
       "12       bid      True  DOUBLE PRECISION\n",
       "13       ask      True  DOUBLE PRECISION\n",
       "14    shrout      True  DOUBLE PRECISION\n",
       "15    cfacpr      True  DOUBLE PRECISION\n",
       "16   cfacshr      True  DOUBLE PRECISION\n",
       "17    altprc      True  DOUBLE PRECISION\n",
       "18    spread      True  DOUBLE PRECISION\n",
       "19  altprcdt      True              DATE\n",
       "20      retx      True  DOUBLE PRECISION"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.describe_table('crspa','msf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Read data\n",
    "The combination of Python and WRDS server (powered by SQL) make data retrieve much more flexible and efficient (this requires basic knowledge of SQL).\n",
    "\n",
    "Another advantage is that we do not have to separate data downloading and future data analysis. We can code the data importing and data analysis in the same place."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "sql = \"select permno,cusip,date,prc,ret \\\n",
    "    from crspa.msf \\\n",
    "    where date between '2016-01-01' and '2016-06-30' \\\n",
    "    order by permno,date\"\n",
    "crsp_sample_1 = db.raw_sql(sql)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>permno</th>\n",
       "      <th>cusip</th>\n",
       "      <th>date</th>\n",
       "      <th>prc</th>\n",
       "      <th>ret</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10001.0</td>\n",
       "      <td>36720410</td>\n",
       "      <td>2016-01-29</td>\n",
       "      <td>8.32</td>\n",
       "      <td>0.116779</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10001.0</td>\n",
       "      <td>36720410</td>\n",
       "      <td>2016-02-29</td>\n",
       "      <td>7.86</td>\n",
       "      <td>-0.055288</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10001.0</td>\n",
       "      <td>36720410</td>\n",
       "      <td>2016-03-31</td>\n",
       "      <td>7.81</td>\n",
       "      <td>-0.006361</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>10001.0</td>\n",
       "      <td>36720410</td>\n",
       "      <td>2016-04-29</td>\n",
       "      <td>7.30</td>\n",
       "      <td>-0.055698</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10001.0</td>\n",
       "      <td>36720410</td>\n",
       "      <td>2016-05-31</td>\n",
       "      <td>7.14</td>\n",
       "      <td>-0.021918</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    permno     cusip        date   prc       ret\n",
       "0  10001.0  36720410  2016-01-29  8.32  0.116779\n",
       "1  10001.0  36720410  2016-02-29  7.86 -0.055288\n",
       "2  10001.0  36720410  2016-03-31  7.81 -0.006361\n",
       "3  10001.0  36720410  2016-04-29  7.30 -0.055698\n",
       "4  10001.0  36720410  2016-05-31  7.14 -0.021918"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "crsp_sample_1.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "sql = \"select permno,cusip,date,prc,ret, abs(prc)*vol as dvol \\\n",
    "    from crspa.msf \\\n",
    "    where date between '2016-01-01' and '2016-06-30' and abs(prc)*vol>1000000000 \\\n",
    "    order by permno,date\"\n",
    "crsp_sample_2 = db.raw_sql(sql)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>permno</th>\n",
       "      <th>cusip</th>\n",
       "      <th>date</th>\n",
       "      <th>prc</th>\n",
       "      <th>ret</th>\n",
       "      <th>dvol</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>14593.0</td>\n",
       "      <td>03783310</td>\n",
       "      <td>2016-01-29</td>\n",
       "      <td>97.339996</td>\n",
       "      <td>-0.075242</td>\n",
       "      <td>1.236778e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>84398.0</td>\n",
       "      <td>78462F10</td>\n",
       "      <td>2016-01-29</td>\n",
       "      <td>193.720795</td>\n",
       "      <td>-0.049783</td>\n",
       "      <td>7.192758e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>84398.0</td>\n",
       "      <td>78462F10</td>\n",
       "      <td>2016-02-29</td>\n",
       "      <td>193.559998</td>\n",
       "      <td>-0.000830</td>\n",
       "      <td>5.653366e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>84398.0</td>\n",
       "      <td>78462F10</td>\n",
       "      <td>2016-03-31</td>\n",
       "      <td>205.520004</td>\n",
       "      <td>0.067212</td>\n",
       "      <td>4.774859e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>84398.0</td>\n",
       "      <td>78462F10</td>\n",
       "      <td>2016-04-29</td>\n",
       "      <td>206.330795</td>\n",
       "      <td>0.003945</td>\n",
       "      <td>3.942229e+09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    permno     cusip        date         prc       ret          dvol\n",
       "0  14593.0  03783310  2016-01-29   97.339996 -0.075242  1.236778e+09\n",
       "1  84398.0  78462F10  2016-01-29  193.720795 -0.049783  7.192758e+09\n",
       "2  84398.0  78462F10  2016-02-29  193.559998 -0.000830  5.653366e+09\n",
       "3  84398.0  78462F10  2016-03-31  205.520004  0.067212  4.774859e+09\n",
       "4  84398.0  78462F10  2016-04-29  206.330795  0.003945  3.942229e+09"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "crsp_sample_2.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "> With Python connection to WRDS, we can also finish the job of merging databases without downloading data from web interface. Try it if you have interest."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Save data in local folder\n",
    "You can output the data if you want to save it locally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "crsp_sample_2.to_csv(file_path+'python_retrieve.csv',index=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
